Best Topic Word Selection for Topic Labelling

Authors

  • Jey Han Lau
  • David Newman
  • Sarvnaz Karimi
  • Timothy Baldwin
Abstract

This paper presents the novel task of best topic word selection, that is, the selection of the topic word that is the best label for a given topic, as a means of enhancing the interpretation and visualisation of topic models. We propose a number of features intended to capture the best topic word, and show that, in combination as inputs to a reranking model, we are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word. This is the case both when training in-domain over other labelled topics for that topic model, and cross-domain, using only labellings from independent topic models learned over document collections from different domains and genres.
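As a rough illustration of the setup the abstract describes, the sketch below contrasts the baseline (take the highest-ranked topic word) with a reranker that scores each topic word by a weighted sum of features. The two features, the weights, and the example topic are hypothetical and are not taken from the paper; the abstract states that the reranking model is trained on labelled topics, whereas here the weights are simply set by hand.

```python
from typing import Callable, List, Sequence

# A topic is represented by its top words, ordered by decreasing probability.
Topic = Sequence[str]
# A feature maps (word, topic) to a score. Both features below are hypothetical.
Feature = Callable[[str, Topic], float]


def baseline_best_word(topic: Topic) -> str:
    """Baseline from the abstract: select the highest-ranked topic word."""
    return topic[0]


def rank_feature(word: str, topic: Topic) -> float:
    """Hypothetical feature: reciprocal of the word's rank within the topic."""
    return 1.0 / (topic.index(word) + 1)


def substring_feature(word: str, topic: Topic) -> float:
    """Hypothetical feature: fraction of the other topic words containing this word."""
    others = [w for w in topic if w != word]
    if not others:
        return 0.0
    return sum(word in w for w in others) / len(others)


def rerank(topic: Topic, features: List[Feature], weights: List[float]) -> List[str]:
    """Order topic words by a weighted sum of feature values, highest first."""
    def score(word: str) -> float:
        return sum(w * f(word, topic) for f, w in zip(features, weights))
    return sorted(topic, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up topic, for illustration only.
    topic = ["stocks", "stock", "market", "stockmarket", "shares"]
    print("baseline:", baseline_best_word(topic))
    # Hand-set weights (an assumption); a trained reranker would learn these.
    print("reranked:", rerank(topic, [rank_feature, substring_feature], [1.0, 2.0]))
```

In this toy topic the baseline returns the first word, while the reranker can promote a lower-ranked word when the other features favour it.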

Similar resources

A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows

A probabilistic topic model assumes that documents are generated through a process involving topics, and then tries to reverse this process to extract the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Kou, Wanqiu, Li Fang and Timothy Baldwin (to appear) Automatic Labelling of Topic Models using Word Vectors and Letter Trigram Vectors, in Proceedings of the Eleventh Asian Information Retrieval Societies Conference (AIRS 2015), Brisbane, Australia

The native representation of LDA-style topics is a multinomial distribution over words, which can be time-consuming to interpret directly. As an alternative representation, automatic labelling has been shown to help readers interpret the topics more efficiently. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and ...

Automatic Labelling of Topic Models Using Word Vectors and Letter Trigram Vectors

The native representation of LDA-style topics is a multinomial distribution over words, but automatic labelling of such topics has been shown to help readers interpret the topics better. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and propose automatic and human evaluations of our method. First, we use a chunk...

Prioritizing the Criteria for Thesis Topic Selection Using the Analytic Hierarchy Process (AHP) from the Viewpoint of Ph.D. Students

Background and Aim: Choosing a thesis topic is one of the most important decisions of postgraduate students, and many factors affect this decision. This study aimed to prioritize the criteria for choosing a thesis topic from Ph.D. students' viewpoint, using the analytic hierarchy process (AHP) and ranking methods. Materials and Methods: This analytical study was carried out on the School of Public ...

Publication date: 2010